feat(duplicates): body_hash structural duplication detection #178
Add symbols.body_hash (canonical body AST for function-shaped symbols), SCHEMA_VERSION 39, duplicates recipe, golden scenario, and agent rule row. Retire ast-hash-duplication plan into architecture/glossary/golden-queries.
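To make the idea of a canonical body form concrete, here is a minimal sketch, assuming a toy AST. The PR summary describes the normalization as identifiers → `$id` and literals → kind; the `AstNode` shapes and the `canonicalize` helper below are hypothetical and are not Codemap's actual parser types or extractor. The sketch stops at the canonical string; per the walkthrough, a SHA-256 digest of such a form is what gets stored.

```typescript
// Toy AST node shapes, invented for illustration only.
type AstNode =
  | { type: "Identifier"; name: string }
  | { type: "Literal"; value: string | number | boolean | null }
  | { type: "Binary"; op: string; left: AstNode; right: AstNode }
  | { type: "Return"; argument: AstNode | null }
  | { type: "Block"; body: AstNode[] };

// Serialize a body so that identifier names and literal values are erased,
// while structure, operators, and literal kinds are preserved.
function canonicalize(node: AstNode): string {
  switch (node.type) {
    case "Identifier":
      return "$id";
    case "Literal":
      return `Literal:${node.value === null ? "null" : typeof node.value}`;
    case "Binary":
      return `(${canonicalize(node.left)} ${node.op} ${canonicalize(node.right)})`;
    case "Return":
      return node.argument ? `return ${canonicalize(node.argument)}` : "return";
    case "Block":
      return `{${node.body.map((child) => canonicalize(child)).join(";")}}`;
  }
}

const id = (name: string): AstNode => ({ type: "Identifier", name });
const num = (value: number): AstNode => ({ type: "Literal", value });
const returning = (argument: AstNode): AstNode => ({
  type: "Block",
  body: [{ type: "Return", argument }],
});

// function addTax(price) { return price * 1.2; }
const bodyA = returning({ type: "Binary", op: "*", left: id("price"), right: num(1.2) });
// function scale(x) { return x * 3; }
const bodyB = returning({ type: "Binary", op: "*", left: id("x"), right: num(3) });
// function shift(x) { return x + 3; }
const bodyC = returning({ type: "Binary", op: "+", left: id("x"), right: num(3) });

console.log(canonicalize(bodyA)); // {return ($id * Literal:number)}
console.log(canonicalize(bodyA) === canonicalize(bodyB)); // true: same shape
console.log(canonicalize(bodyA) === canonicalize(bodyC)); // false: operator differs
```

Under this scheme, two functions that differ only in naming and constant values collapse to the same canonical string, which is what makes a hash of it usable as a structural duplicate key.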
🦋 Changeset detected. Latest commit: e5e289b. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
📝 Walkthrough
This PR adds structural duplicate detection to Codemap by computing a canonical SHA-256 hash of function bodies at index time and storing it in `symbols.body_hash`.
Estimated code review effort: 3 (Moderate), ~25 minutes
Pre-merge checks: 5 passed
Fix deferred correctness/perf items: return-position Literal:nullish for null/undefined/void 0/bare return; void 0 only (not all void); FD symbol index via markArrowSymbol at push time; docs parity and regression tests.
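The return-position rule in this commit message can be illustrated with a small sketch. The `ReturnArg` shapes and both helper names below are assumptions made for illustration, not the extractor's real types. The sketch treats `null`, `undefined`, `void 0`, and a bare `return` as one nullish form, and leaves other `void` expressions alone, presumably because `void f()` still evaluates its operand and is therefore not equivalent to returning nothing. It also ignores the possibility of a shadowed `undefined` binding.

```typescript
// Hypothetical shapes for the argument of a return statement.
type ReturnArg =
  | null // bare `return;`
  | { type: "NullLiteral" }
  | { type: "Identifier"; name: string }
  | { type: "NumericLiteral"; value: number }
  | { type: "Void"; argument: ReturnArg }
  | { type: "Call"; callee: string };

// True when the return argument should collapse to the shared nullish form.
function isNullishReturn(arg: ReturnArg): boolean {
  if (arg === null) return true;
  switch (arg.type) {
    case "NullLiteral":
      return true;
    case "Identifier":
      return arg.name === "undefined";
    case "Void":
      // Only `void 0` qualifies; `void sideEffect()` keeps its own shape.
      return (
        arg.argument !== null &&
        arg.argument.type === "NumericLiteral" &&
        arg.argument.value === 0
      );
    default:
      return false;
  }
}

// Canonical text for a return statement; non-nullish arguments are
// abbreviated to a placeholder to keep the sketch short.
function canonicalReturn(arg: ReturnArg): string {
  return isNullishReturn(arg) ? "return Literal:nullish" : "return <expr>";
}

console.log(canonicalReturn(null)); // return Literal:nullish
console.log(canonicalReturn({ type: "Void", argument: { type: "NumericLiteral", value: 0 } })); // return Literal:nullish
console.log(canonicalReturn({ type: "Void", argument: { type: "Call", callee: "log" } })); // return <expr>
```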
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/golden-queries.md`:
- Line 83: Update the documentation for the `duplicates` query and
`symbols.body_hash` population: replace the narrow phrase "named functions,
arrows, and class methods" with the precise contract "function-shaped symbols
(function, method, getter, setter)" and note that `body_hash` is set at index time for
those symbol kinds when `body_line_count >= 2`; ensure the term "function-shaped
symbols" is used consistently in the `duplicates` description and any other
mentions of `symbols.body_hash` so readers understand getter/setter coverage.
In `@templates/recipes/duplicates.sql`:
- Around line 5-10: Duplicate grouping is done before applying scope filters;
move scope filters into the input set to the grouping (e.g., build a
filtered_symbols CTE or subquery selecting from symbols with the params filters
such as path_prefix and min_body_lines) so that the GROUP BY on body_hash and
the HAVING (min_count from params) operate only on the scoped rows. Locate the
symbols table usage and replace the direct GROUP BY on symbols.body_hash with
grouping over the filtered_symbols result (preserving references to body_hash
and params), and ensure any later joins or WHEREs that reference
path_prefix/min_body_lines are applied inside that filtered source rather than
after aggregation.
📒 Files selected for processing (30)
- .changeset/ast-hash-duplication.md
- docs/architecture.md
- docs/glossary.md
- docs/golden-queries.md
- docs/plans/ast-hash-duplication.md
- docs/roadmap.md
- fixtures/CAPABILITIES.json
- fixtures/golden/minimal/barrel-files.json
- fixtures/golden/minimal/coverage-confirmed-dead-no-ingest.json
- fixtures/golden/minimal/coverage-confirmed-dead.json
- fixtures/golden/minimal/duplicates.json
- fixtures/golden/minimal/files-count.json
- fixtures/golden/minimal/files-hashes.json
- fixtures/golden/minimal/index-summary.json
- fixtures/golden/minimal/index-table-stats.json
- fixtures/golden/minimal/refactor-risk-ranking.json
- fixtures/golden/minimal/source-fts-row-count.json
- fixtures/golden/minimal/unimported-exports.json
- fixtures/golden/minimal/untested-and-dead.json
- fixtures/golden/minimal/worst-covered-exports.json
- fixtures/golden/scenarios.json
- fixtures/minimal/src/bench/duplicate-body-a.ts
- fixtures/minimal/src/bench/duplicate-body-b.ts
- src/db.ts
- src/extractors/body-hash.test.ts
- src/extractors/body-hash.ts
- src/parser.ts
- templates/agent-content/rule/00-full.md
- templates/recipes/duplicates.md
- templates/recipes/duplicates.sql
💤 Files with no reviewable changes (1)
- docs/plans/ast-hash-duplication.md
Apply path_prefix and min_body_lines before duplicate_count aggregation so scoped queries report accurate group sizes. Align golden-queries wording with function-shaped symbol contract (getter/setter included).
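The ordering issue behind this fix can be shown outside SQL. The sketch below is an in-memory TypeScript analogue, not the recipe itself: `SymbolRow`, `DuplicateScope`, `findDuplicates`, and the `filePath` field are hypothetical names, while `pathPrefix`, `minBodyLines`, and `minCount` mirror the `path_prefix`, `min_body_lines`, and `min_count` params mentioned in the review. It filters to the scope first and only then counts rows per hash, so the reported group size never includes out-of-scope rows.

```typescript
// Hypothetical row and parameter shapes, loosely modeled on the review text.
interface SymbolRow {
  name: string;
  filePath: string;
  bodyHash: string | null;
  bodyLineCount: number;
}

interface DuplicateScope {
  pathPrefix: string;
  minBodyLines: number;
  minCount: number;
}

function findDuplicates(
  rows: SymbolRow[],
  scope: DuplicateScope,
): Array<SymbolRow & { duplicateCount: number }> {
  // 1. Apply scope filters to the input set (the "filtered_symbols" step).
  const scoped = rows.filter(
    (row) =>
      row.bodyHash !== null &&
      row.filePath.startsWith(scope.pathPrefix) &&
      row.bodyLineCount >= scope.minBodyLines,
  );

  // 2. Count rows per hash over the scoped rows only.
  const counts = new Map<string, number>();
  for (const row of scoped) {
    const hash = row.bodyHash as string;
    counts.set(hash, (counts.get(hash) ?? 0) + 1);
  }

  // 3. Keep per-symbol rows whose group meets the minimum count.
  return scoped
    .map((row) => ({ ...row, duplicateCount: counts.get(row.bodyHash as string) ?? 0 }))
    .filter((row) => row.duplicateCount >= scope.minCount);
}

const sampleRows: SymbolRow[] = [
  { name: "a", filePath: "src/a.ts", bodyHash: "h1", bodyLineCount: 3 },
  { name: "b", filePath: "src/b.ts", bodyHash: "h1", bodyLineCount: 3 },
  { name: "c", filePath: "vendor/c.ts", bodyHash: "h1", bodyLineCount: 3 },
  { name: "d", filePath: "src/d.ts", bodyHash: "h2", bodyLineCount: 4 },
];

const duplicates = findDuplicates(sampleRows, {
  pathPrefix: "src/",
  minBodyLines: 2,
  minCount: 2,
});
console.log(duplicates.map((row) => `${row.name}:${row.duplicateCount}`)); // [ 'a:2', 'b:2' ]
```

Grouping before filtering would have counted the `vendor/` row as well and reported a group size of 3 for a query scoped to `src/`.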
Wire duplication.body-hash agent-eval probe; fix spike-crap tier count after fixture symbols; add duplicates-recipe-scope regression tests; setter/nullish-scope unit tests; consumer doc caveats (LIMIT, async/gen).
Summary
- `symbols.body_hash` at index time (canonical function-body AST: identifiers → `$id`, literals → kind) for function-shaped symbols, with `SCHEMA_VERSION` 39 and partial index `idx_symbols_body_hash`.
- `duplicates` recipe (per-symbol rows + `duplicate_count` via CTE) and agent rule trigger for structural duplicate discovery.
- Retires `docs/plans/ast-hash-duplication.md` into architecture, glossary, golden-queries, and roadmap; includes changeset for minor release.
Test plan
- `bun test src/extractors/body-hash.test.ts` (11 tests: FD, arrows, methods, getters, templates)
- `bun run typecheck`
- `CODEMAP_ROOT=fixtures/minimal bun scripts/query-golden.ts` (includes new `duplicates` scenario)
- `bun test scripts/query-golden-coverage-matrix.test.mjs`
Summary by CodeRabbit
New Features
- `duplicates` recipe for identifying identical code bodies across files
Documentation
Fixtures